    The Dataset Multiplicity Problem: How Unreliable Data Impacts Predictions

    We introduce dataset multiplicity, a way to study how inaccuracies, uncertainty, and social bias in training datasets impact test-time predictions. The dataset multiplicity framework asks a counterfactual question of what the set of resultant models (and associated test-time predictions) would be if we could somehow access all hypothetical, unbiased versions of the dataset. We discuss how to use this framework to encapsulate various sources of uncertainty in datasets' factualness, including systemic social bias, data collection practices, and noisy labels or features. We show how to exactly analyze the impacts of dataset multiplicity for a specific model architecture and type of uncertainty: linear models with label errors. Our empirical analysis shows that real-world datasets, under reasonable assumptions, contain many test samples whose predictions are affected by dataset multiplicity. Furthermore, the choice of domain-specific dataset multiplicity definition determines what samples are affected, and whether different demographic groups are disparately impacted. Finally, we discuss implications of dataset multiplicity for machine learning practice and research, including considerations for when model outcomes should not be trusted. Comment: 25 pages, 8 figures. Accepted at FAccT '2

    Monitoring of dabigatran anticoagulation and its reversal in vitro by thrombelastography.

    Dabigatran etexilate, a pro-drug of a direct thrombin inhibitor, was approved a few years ago for non-valvular atrial fibrillation and deep venous thrombosis. Rapid monitoring of the dabigatran level is essential in trauma and bleeding patients but the traditional plasma-based assays may not sufficiently display the effect. Furthermore, no antidote exists and reversal of the anticoagulant effect is impossible or difficult. The present study investigated the in vitro effect of dabigatran on whole blood thromboelastography (TEG) and its reversal by recombinant activated factor VII and prothrombin complex concentrate

    ALMA 400 pc Imaging of a z = 6.5 Massive Warped Disk Galaxy

    © 2023. The Author(s). Published by the American Astronomical Society. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/ We present 0.″075 (≈400 pc) resolution Atacama Large Millimeter/submillimeter Array (ALMA) observations of the [C ii] and dust continuum emission from the host galaxy of the z = 6.5406 quasar, P036+03. We find that the emission arises from a thin, rotating disk with an effective radius of 0.″21 (1.1 kpc). The velocity dispersion of the disk is consistent with a constant value of 66.4 ± 1.0 km s⁻¹, yielding a scale height of 80 ± 30 pc. The [C ii] velocity field reveals a distortion that we attribute to a warp in the disk. Modeling this warped disk yields an inclination estimate of 40.°4 ± 1.°3 and a rotational velocity of 116 ± 3 km s⁻¹. The resulting dynamical mass estimate of (1.96 ± 0.10) × 10¹⁰ M⊙ is lower than previous estimates, which strengthens the conclusion that the host galaxy is less massive than expected based on local scaling relations between the black hole mass and the host galaxy mass. Using archival MUSE Lyα observations, we argue that counterrotating halo gas could provide the torque needed to warp the disk. We further detect a region with excess (15σ) dust continuum emission, which is located 1.3 kpc northwest of the galaxy’s center and is gravitationally unstable (Toomre Q < 0.04). We posit this is a star-forming region whose formation was triggered by the warp because the region is located within a part of the warped disk where gas can efficiently lose angular momentum. The combined ALMA and MUSE imaging provides a unique view of how gas interactions within the disk–halo interface can influence the growth of massive galaxies within the first billion years of the Universe. Peer reviewed

    Changes in endosymbiont complexity drive host-level compensatory adaptations in cicadas

    Copyright © 2018 Campbell et al. For insects that depend on one or more bacterial endosymbionts for survival, it is critical that these bacteria are faithfully transmitted between insect generations. Cicadas harbor two essential bacterial endosymbionts, "Candidatus Sulcia muelleri" and "Candidatus Hodgkinia cicadicola." In some cicada species, Hodgkinia has fragmented into multiple distinct but interdependent cellular and genomic lineages that can differ in abundance by more than two orders of magnitude. This complexity presents a potential problem for the host cicada, because low-abundance but essential Hodgkinia lineages risk being lost during the symbiont transmission bottleneck from mother to egg. Here we show that all cicada eggs seem to receive the full complement of Hodgkinia lineages, and that in cicadas with more complex Hodgkinia this outcome is achieved by increasing the number of Hodgkinia cells transmitted by up to 6-fold. We further show that cicada species with varying

    Principal component and factor analytic models in international sire evaluation

    Background: Interbull is a non-profit organization that provides internationally comparable breeding values for globalized dairy cattle breeding programmes. Due to different trait definitions and models for genetic evaluation between countries, each biological trait is treated as a different trait in each of the participating countries. This yields a genetic covariance matrix of dimension equal to the number of countries, which typically involves high genetic correlations between countries. This gives rise to several problems, such as over-parameterized models and increased sampling variances, if genetic (co)variance matrices are considered to be unstructured. Methods: Principal component (PC) and factor analytic (FA) models allow highly parsimonious representations of the (co)variance matrix compared to the standard multi-trait model and have, therefore, attracted considerable interest for their potential to ease the burden of the estimation process for multiple-trait across country evaluation (MACE). This study evaluated the utility of PC and FA models to estimate variance components and to predict breeding values for MACE for protein yield. This was tested using a dataset comprising Holstein bull evaluations obtained in 2007 from 25 countries. Results: In total, 19 principal components or nine factors were needed to explain the genetic variation in the test dataset. Estimates of the genetic parameters under the optimal fit were almost identical for the two approaches. Furthermore, the results were in good agreement with those obtained from the full rank model and with those provided by Interbull. The estimation time was shortest for models fitting the optimal number of parameters and prolonged when under- or over-parameterized models were applied. Correlations between estimated breeding values (EBV) from the PC19 and PC25 were unity. With few exceptions, correlations between EBV obtained using FA and PC approaches under the optimal fit were ≥ 0.99. For both approaches, EBV correlations decreased when the optimal model and models fitting too few parameters were compared. Conclusions: Genetic parameters from the PC and FA approaches were very similar when the optimal number of principal components or factors was fitted. Over-fitting increased estimation time and standard errors of the estimates but did not affect the estimates of genetic correlations or the predictions of breeding values, whereas fitting too few parameters affected bull rankings in different countries.

    Cost-effectiveness of tenofovir gel in urban South Africa: model projections of HIV impact and threshold product prices.

    BACKGROUND: There is urgent need for effective HIV prevention methods that women can initiate. The CAPRISA 004 trial showed that a tenofovir-based vaginal microbicide had significant impact on HIV incidence among women. This study uses the trial findings to estimate the population-level impact of the gel on HIV and HSV-2 transmission, and price thresholds at which widespread product introduction would be as cost-effective as male circumcision in urban South Africa. METHODS: The estimated 'per sex-act' HIV and HSV-2 efficacies were imputed from CAPRISA 004. A dynamic HIV/STI transmission model, parameterised and fitted to Gauteng (HIV prevalence of 16.9% in 2008), South Africa, was used to estimate the impact of gel use over 15 years. Uptake was assumed to increase linearly to 30% over 10 years, with gel use in 72% of sex-acts. Full economic programme and averted HIV treatment costs were modelled. Cost per DALY averted is estimated and a microbicide price that equalises its cost-effectiveness to that of male circumcision is estimated. RESULTS: Using plausible assumptions about product introduction, we predict that tenofovir gel use could lead to a 12.5% and 4.9% reduction in HIV and HSV-2 incidence respectively, by year 15. Microbicide introduction is predicted to be highly cost-effective (under $300 per DALY averted), though the dose price would need to be just $0.12 to be equally cost-effective as male circumcision. A single dose or highly effective (83% HIV efficacy per sex-act) regimen would allow for more realistic threshold prices ($0.25 and $0.33 per dose, respectively). CONCLUSIONS: These findings show that an effective coitally-dependent microbicide could reduce HIV incidence by 12.5% in this setting, if current condom use is maintained. For microbicides to be in the range of the most cost-effective HIV prevention interventions, product costs will need to decrease substantially

    The Kinetics of Specific Immune Responses in Rhesus Monkeys Inoculated with Live Recombinant BCG Expressing SIV Gag, Pol, Env, and Nef Proteins

    Development of an effective preventive or therapeutic vaccine against HIV-1 is an important goal in the fight against AIDS. Effective virus clearance and inhibition of spread to target organs depends principally on the cellular immune response. Therefore, a vaccine against HIV-1 should elicit virus-specific cytotoxic lymphocyte (CTL) responses to eliminate the virus during the cell-associated stages of its life cycle. The vaccine should also be capable of inducing immunity at the mucosal surfaces, the primary route of transmission. Recombinant Bacille Calmette–Guérin (BCG) expressing viral proteins offers an excellent candidate vaccine in view of its safety and ability to persist intracellularly, resulting in the induction of long-lasting immunity and stimulation of the cellular immune response. BCG can be administered orally to induce HIV-specific immunity at the mucosal surfaces. The immunogenicity of four recombinant BCG constructs expressing simian immunodeficiency virus (SIV) Gag, Pol, Env, and Nef proteins was tested in rhesus macaques. A single simultaneous inoculation of all four recombinants elicited SIV-specific IgA and IgG antibody, and cellular immune responses, including CTL and helper T cell proliferation. Our results demonstrate that BCG recombinant vectors can induce concomitant humoral and cellular immune responses to the major proteins of SIV

    Principal component approach in variance component estimation for international sire evaluation

    Background: The dairy cattle breeding industry is a highly globalized business, which needs internationally comparable and reliable breeding values of sires. The international Bull Evaluation Service, Interbull, was established in 1983 to respond to this need. Currently, Interbull performs multiple-trait across country evaluations (MACE) for several traits and breeds in dairy cattle and provides international breeding values to its member countries. Estimating parameters for MACE is challenging since the structure of datasets and conventional use of multiple-trait models easily result in over-parameterized genetic covariance matrices. The number of parameters to be estimated can be reduced by taking into account only the leading principal components of the traits considered. For MACE, this is readily implemented in a random regression model. Methods: This article compares two principal component approaches to estimate variance components for MACE using real datasets. The methods tested were a REML approach that directly estimates the genetic principal components (direct PC) and the so-called bottom-up REML approach (bottom-up PC), in which traits are sequentially added to the analysis and the statistically significant genetic principal components are retained. Furthermore, this article evaluates the utility of the bottom-up PC approach to determine the appropriate rank of the (co)variance matrix. Results: Our study demonstrates the usefulness of both approaches and shows that they can be applied to large multi-country models considering all concerned countries simultaneously. These strategies can thus replace the current practice of estimating the covariance components required through a series of analyses involving selected subsets of traits. Our results support the importance of using the appropriate rank in the genetic (co)variance matrix. Using too low a rank resulted in biased parameter estimates, whereas too high a rank did not result in bias, but increased standard errors of the estimates and notably the computing time. Conclusions: In terms of estimation accuracy, both principal component approaches performed equally well and permitted the use of more parsimonious models through random regression MACE. The advantage of the bottom-up PC approach is that it does not need any previous knowledge of the rank. However, with a predetermined rank, the direct PC approach needs less computing time than the bottom-up PC.

    Massive migration from the steppe is a source for Indo-European languages in Europe

    We generated genome-wide data from 69 Europeans who lived between 8,000-3,000 years ago by enriching ancient DNA libraries for a target set of almost four hundred thousand polymorphisms. Enrichment of these positions decreases the sequencing required for genome-wide ancient DNA analysis by a median of around 250-fold, allowing us to study an order of magnitude more individuals than previous studies and to obtain new insights about the past. We show that the populations of western and far eastern Europe followed opposite trajectories between 8,000-5,000 years ago. At the beginning of the Neolithic period in Europe, ~8,000-7,000 years ago, closely related groups of early farmers appeared in Germany, Hungary, and Spain, different from indigenous hunter-gatherers, whereas Russia was inhabited by a distinctive population of hunter-gatherers with high affinity to a ~24,000 year old Siberian. By ~6,000-5,000 years ago, a resurgence of hunter-gatherer ancestry had occurred throughout much of Europe, but in Russia, the Yamnaya steppe herders of this time were descended not only from the preceding eastern European hunter-gatherers, but from a population of Near Eastern ancestry. Western and Eastern Europe came into contact ~4,500 years ago, as the Late Neolithic Corded Ware people from Germany traced ~3/4 of their ancestry to the Yamnaya, documenting a massive migration into the heartland of Europe from its eastern periphery. This steppe ancestry persisted in all sampled central Europeans until at least ~3,000 years ago, and is ubiquitous in present-day Europeans. These results provide support for the theory of a steppe origin of at least some of the Indo-European languages of Europe